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Abstract The linearly constrained matrix rank minimization problem is widely applicable in many fields such as 
control, signal processing and system identification. The tightest convex relaxation of this problem is the linearly 
constrained nuclear norm minimization. Although the latter can be cast as a semidefinite programming problem, 
such an approach is computationally expensive to solve when the matrices are large. In this paper, we propose 
fixed point and Bregman iterative algorithms for solving the nuclear norm minimization problem and prove conver- 
gence of the first of these algorithms. By using a homotopy approach together with an approximate singular value 
decomposition procedure, we get a very fast, robust and powerful algorithm, which we call FPCA (Fixed Point Con- 
tinuation with Approximate SVD), that can solve very large matrix rank minimization problems [j. Our numerical 
results on randomly generated and real matrix completion problems demonstrate that this algorithm is much faster 
and provides much better recoverability than semidefinite programming solvers such as SDPT3. For example, our 
algorithm can recover 1000 x 1000 matrices of rank 50 with a relative error of 10~ 5 in about 3 minutes by sam- 
pling only 20 percent of the elements. We know of no other method that achieves as good recoverability. Numerical 
experiments on online recommendation, DNA microarray data set and image inpainting problems demonstrate the 
effectiveness of our algorithms. 

Keywords Matrix Rank Minimization • Matrix Completion Problem • Nuclear Norm Minimization • Fixed Point 
Iterative Method • Bregman Distances • Singular Value Decomposition 
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1 Introduction 

The matrix rank minimization problem can be written as 

min rank(X) 
s.t. Xetf, 
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where X 6 M. mxn and ^ is a convex set. This model has many applications such as determining a low-order con 



trailer for a plant 12 ill and a minimum order linear system realization 11911 . and solving low-dimensional Euclidean 



embedding problems IC8I1 . 



In this paper, we are interested in methods for solving the affinely constrained matrix rank minimization problem 

(1.1) 



min rank(X) 
s.t. £/(X)=b, 



where X 6 R mx " is the decision variable, and the linear map srf : W nxn — > W and vector b € M p are given. 
The matrix completion problem 

min rank(X) 

s.t. Xij=M ih (ij)en 

is a special case of ( 11.11 ), where X and M are both mx n matrices and Q is a subset of index pairs The so 



called collaborative filtering problem 133c 13611 can be cast as a matrix completion problem. Suppose users in an 
online survey provide ratings of some movies. This yields a matrix M with users as rows and movies as columns 
whose (t, y)-th entry My is the rating given by the i-th user to the j-th movie. Since most users rate only a small 
portion of the movies, we typically only know a small subset {Mjj\(i,j) G 12} of the entries. Based on the known 
ratings of a user, we want to predict the user's ratings of the movies that the user did not rate; i.e., we want to fill in 
the missing entries of the matrix. It is commonly believed that only a few factors contribute to an individual's tastes 
or preferences for movies. Thus the rating matrix M is likely to be of numerical low rank in the sense that relatively 
few of the top singular values account for most of the sum of all of the singular values. Finding such a low-rank 
matrix M corresponds to solving the matrix completion problem (11.2l l. 



1.1 Connections to compressed sensing 

When the matrix X is diagonal, problem ( 11. lb reduces to the cardinality minimization problem 

min llxllo 
s.t. Ax = b, 

where x 6 M",A 6 M. mx " 7 b 6 K m and \\x\\q denotes the number of nonzeros in the vector x. This problem finds 
the sparsest solution to an underdetermined system of equations and has a wide range of applications in signal 



processing. This problem is NP-hard [30]. To get a more computationally tractable problem, we can replace ||x||o 
by its convex envelope. 

Definition 1 The convex envelope of a function / : ^ — > R is defined as the largest convex function g such that 
g(x) < f(x) for all x E (see e.g., Q). 

It is well known that the convex envelope of ||jc||o is ||*||i, the l\ norm of x, which is the sum of the absolute 
values of all components of x. Replacing the objective function ||x||o in ( 11.31) by ||x||i yields the so-called basis 
pursuit problem 

mm IMl! 
s.t. Ax — b. 



The basis pursuit problem has received an increasing amount of attention since the emergence of the field of com- 
pressed sensing (CS) II lc 1 1411. Compressed sensing theories connect the NP-hard problem ( 11.3b to the convex and 
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computationally tractable problem dl -4b and provide guarantees for when an optimal solution to ( 1 1.41) gives an op- 
timal solution to (11.3b . In the cardinality minimization and basis pursuit problems ( 11.3b and ( 11.4b . b is a vector of 
measurements of the signal x obtained using the sampling matrix A. The main result of compressed sensing is that 
when the signal x is sparse, i.e., k := \\x\\q <C n, we can recover the signal by solving (11.4b with a very limited num- 
ber of measurements, i.e., m <C «, when A is a Gaussian random matrix or when it corresponds to a partial Fourier 
transformation. Note that if b is contaminated by noise, the constraint Ax = b in (11.4b must be relaxed, resulting in 
either the problem 



mm ||jq|i 

s.t. ||Ajc-^|| 2 < 9 



(1.5) 



or its Lagrangian version 



mm ju | pc 1 1 1 



\Ax-bf 2 , 



(1-6) 



where 9 and fx are parameters and ||;q|2 denotes the Euclidean norm of a vector x.. Algorithms for solving (11.4b 
and its variants (11.5b and ( 11.6b have been widely investigated and many algorithms have been suggested including 
convex optimization methods (JlSSSEl) and heuristic methods (El; ESS 0)- 



1.2 Nuclear norm minimization 

The rank of a matrix is the number of its positive singular values. The matrix rank minimization (II. lb is NP-hard 
in general due to the combinational nature of the function rank(-). Similar to the cardinality function ||x||o, we can 
replace rank(X) by its convex envelope to get a convex and more computationally tractable approximation to jl.lt . 
It turns out that the convex envelope of rank(Z) on the set {X € R mx " : ||X||2 < 1} is the nuclear norm ||X||* IU8I1 . 
i.e., the nuclear norm is the best convex approximation of the rank function over the unit ball of matrices with norm 
less than one, where ||X||2 is the operator norm of X. The nuclear norm and operator norm are defined as follows. 
Definition 2 Nuclear norm and Operator norm. Assume that the matrix X has r positive singular values of 0\ > 
G2 > ■ ■ ■ > O r > 0. The nuclear norm of X is defined as the sum of its singular values, i.e., 

||X||» :=£c7,-(X). 

The operator norm of matrix X is defined as the largest singular value of X, i.e., 

||X|| 2 :=c7i(X). 

The nuclear norm is also known as Schatten 1-norm or Ky Fan norm. Using it as an approximation to rank(X) 
in dl.lb yields the nuclear norm minimization problem 

min IIXlL 

II II* Q 7 x 

s.t. */(X)=b. 

As in the basis pursuit problem, if b is contaminated by noise, the constraint srf (X) — b must be relaxed, resulting 
in either the problem 

min ||X||h, 

s.t. \\&/(X)-b\\ 2 < 9 
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or its Lagrangian version 

rmn;u||X||* + -|K(X:)-&||i, (1.8) 

where 6 and ji are parameters. 

Note that if we write X in vector form by stacking the columns of X in a single vector vec(Z) € R m ", then we 
get the following equivalent formation of ( 11.71 ): 

min||X||* 
s.t. A vec(X) = b, 

where A £ RP xm " is the matrix corresponding to the linear map . An important question is: when will an optimal 
solution to the nuclear norm minimization problem ( 11.7b gi ve an optimal solution to matrix rank minimization 
problem ( 11.11) . In response to this question, Recht et al. 132] proved that if the entries of A are suitably random, 
e.g., i.i.d. Gaussian, then with very high probability, most mx n matrices of rank r can be recovered by solving 
the nuclear norm minimization ( 11.7b or equivalently, ( 11.9b . whenever p > Cr(m + n)log(mn) , where C is a positive 
constant. 

For the matrix completion problem ( 11.2b . the corresponding nuclear norm minimization problem is 

min ILXlL 

11 11 , N (1.10) 
s.t. X r , A-/ ; ;,.;/../:. . Q. 



Candes et al. |@] proved the following result. 



Theorem 1 Let M be an n\ x ni matrix of rank r with SVD 

r 

k=\ 

where the family {uk}i<k<r is selected uniformly at random among all families of r orthonormal vectors, and 
similarly for the family {v J t}i<^<,.. Let n = max(«i,«2)- Suppose we observe m entries ofM with locations sampled 
uniformly at random. Then there are constants C and c such that if 

m > Cn 5//4 rlogn, 

the minimizer to the problem dl - 10b is unique and equal to M with probability at least 1 — cm -3 . In addition, if 
r < n x l 5 , then the recovery is exact with probability at least 1 — cm" 3 provided that 

m > Cn 6 / 5 r\ogn. 

This theorem states that a surprisingly small number of entries are sufficient to complete a low-rank matrix with 
high probability. 



Recently, this result was strengthened by Candes and Tao in 11211 . where it is proved that under certain incoher- 
ence conditions, the number of samples m that are required is only O(nrlogn). 

The dual problem corresponding to the nuclear norm minimization problem ( 11.7b is 

""f 2 , „ (LID 
s.t. |K*(z)|| 2 <l, 
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where srf* is the adjoint operator of Both ( 11.71) and dl.llb can be rewritten as equivalent semidefinite program- 
ming (SDP) problems. The SDP formulation of ( fTTTI ) is: 



min r i(Tr(Wi)+Tr(W 2 )) 



s.t. 



Wi X 

X J w 2 
srf{X)=b 



>-0 



(1.12) 



where Tr(X) denotes the trace of the square matrix X. The SDP formulation of dl.llb is: 



max b z 

z 



s.t. 



In 



(1.13) 



Thus to solve ( fTTT2l and ( TTTT31 , we can use SDP solvers such as SeDuMi |[38|] and SDPT3 [42] to solve ( fl~T2l 
and ( 11.13b . Note that the number of variables in (11.121 ) is j(m + n) (m + n + 1 ) . SDP solvers cannot usually solve a 
problem when m and n are both much larger than 100. 

Recently, Liu and Vandenberghe 12911 proposed an interior-point method for another nuclear norm approximation 
problem 



where B 6 



and 



is a linear mapping from W to '. 



srf {x) =x\A\ + x 2 A 2 H h XpA p 

<n . The equivalent SDP formulation of (11.14b is 

X n w 2 2 (Tr(Wl)+Tr(W2)) 
srf{x) -B W 2 



(1.14) 



(1.15) 



Liu and Vandenberghe i29h proposed a customized method for computing the scaling direction in an interior point 
method for solving the SDP (11.15b . The complexity of each iteration in their method was reduced from 0(p 6 ) to 
0{p A ) when m = O(p) and n = O(p); thus they were able to solve problems up to dimension m = n = 350. 

Another algorithm for solving (11.7b is due to Burer and Monteiro , (see also Rennie and Srebro Jj^ilj^l'). 
This algorithm uses the low-rank factorization X = LR J of the matrix Xel"", where L eR mxr ,R eR nxr ,r < 
min{m,n}, and solves the optimization problem 



nun 

L.R 



Ml) 



s.t. ff/(LR J )=b, 
where \\X\\f denotes the Frobenius norm of the matrix X: 



(1.16) 



i=l 



2,1/2 



= (£4) 1/2 =(Tr(XX T )) 



1/2 



It is known that as long as r is chosen to be sufficiently larger than the rank of the optimal solution matrix of 
the nuclear norm problem dl.7l ). this low-rank factorization problem is equivalent to the nuclear norm problem dl.7l ) 



6 



Shiqian Ma et al, 



(see e.g., B2IP . The advantage of this low-rank factorization formulation is that both the objective function and the 
constraints are differentiable. Thus gradient-based optimization algorithms such as conjugate gradient algorithms 
and augmented Lagrangian algorithms can be used to solve this problem. However, the constraints in this problem 
are nonconvex, so one can only be assured of obtaining a local minimizer. Also, how to choose r is still an open 
question. 

One very interesting algorithm is the so called singular value thresholding algorithm (SVT) {§!] which appeared 
almost simultaneously with our work. SVT is inspired by the linearized Bregman algorithms for compressed sensing 
and t\ -regularized problems. In |@] it is shown that SVT is efficient for large matrix completion problems. However, 
SVT only works well for very low rank matrix completion problems. For problems where the matrices are not of 
very low rank, SVT is slow and not robust therefore often fails. 

Our algorithms have some similarity with the SVT algorithm in that they make use of matrix shrinkage (see 
Section 2). However, other than that, they are greatly different. All of our methods are based on a fixed point 
continuation (FPC) algorithm which uses an operator splitting technique for solving ( 11.8b . By adopting a Monte 
Carlo approximate SVD in the FPC, we get an algorithm, which we call FPCA (Fixed Point Continuation with 
Approximate SVD), that usually gets the optimal solution to dl.lt even if the condition of Theorem Q] or those for 
the affine constrained case, are violated. Moreover, our algorithm is much faster than state-of-the-art SDP solvers 
such as SDPT3 applied to jl.l2t . Also, FPCA can recover matrices of moderate rank that cannot be recovered by 
SDPT3, SVT, etc. with the same amount of samples. For example, for matrices of size 1000 x 1000 and rank 50, 
FPCA can recover them with a relative error of 10~ 5 in about 3 minutes by sampling only 20 percent of the matrix 
elements. As far as we know, there is no other method that has as good a recoverability property. 



1.3 Outline and Notation 

Outline. The rest of this paper is organized as follows. In Section 2 we review the fixed-point continuation 
algorithm for i\ -regularized problems. In Section 3 we give an analogous fixed-point iterative algorithm for the 
nuclear norm minimization problem and prove that it converges to an optimal solution. In Section 4 we discuss 
a continuation technique for accelerating the convergence of our algorithm. In Section 5 we propose a Bregman 
iterative algorithm for nuclear norm minimization extending the approach in 14411 for compressed sensing to the 
rank minimization problem. In Section 6 we incorporate a Monte-Carlo approximate SVD procedure into our fixed- 
point continuation algorithm to speed it up and improve its ability to recover low-rank matrices. Numerical results 
for both synthesized matrices and real problems are given in Section 7. We give conclusions in Section 8. 

Notation. Throughout this paper, we always assume that the singular values are arranged in nonincreasing 
order, i.e., 0\ > 02 > . . . > O r > = ay + i = . . . = <7 m in{fli./?} . df denotes the subdifferential of the function / and 
g k = g(X k ) = £f*(£/(X k ) - b)) is the gradient of function \\\s/(X) - b\\j at the point X k . Diag(s) denotes the 
diagonal matrix whose diagonal elements are the elements of the vector s. sgn(f) is the signum function of t G M, 
i.e., 



sgn(f) := < 



+1 if f > 0, 
if f = 0, 
-1 if ? < 0, 



while the signum multifunction of t 6 R is 



SGN(f) :=d\t\ = < 



{+1} iff > 0, 
[-1,1] if? =0, 
{-1} iff < 0. 
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We use a b to denote the elementwise multiplication of two vectors a and b. We use X (k : I) to denote the submatrix 
of X consisting of the fc-th to Z-th column of X. We use K" to denote the nonnegative orthant in M". 

2 Fixed point iterative algorithm 

Our fixed point iterative algorithm for solving (11.8b is the following simple two-line algorithm: 

(Y k =X k -Tg(X k ) 

where 5 V (-) is the matrix shrinkage operator which will be defined later. 



Our algorithm J2.lt is inspired by the fixed point iterative algorithm proposed in 12411 for the l\ -regularized 
problem dl.61 ). The idea behind this algorithm is an operator splitting technique. Note that x* is an optimal solution 
to ( 11.6b if and only if 

Oe/iSGN(x*)+s*, (2.2) 

where g* = A J (Ax* — b). For any T > 0, ( 12.2b is equivalent to 

e T/iSGN(jc*) + Tg(x*). (2.3) 

Note that the operator T(-) : = TjuSGN(-) + Tg(-) on the right hand side of ( 12.31 ) can be split into two parts: T(-) = 
7i(-) - r 2 (0, where Zi(-) = TJiSGN(-) +/(•) and = /(•) - T*Q. 

Letting y = T 2 (jc* ) = x* - tA j (Ax* - b), (IQT l is equivalent to 

e Ti (x*) - y = zjiSGN(x*) +x* - y. (2.4) 

Note that ( 12.41) is actually the optimality conditions for the following convex problem 

1 , 

rninTjixIpc + — yWi- (2-5) 
x* 2 

This problem has a closed form optimal solution given by the so called shrinkage operator: 

x* =§ v (y), 

where v = T^i, and shrinkage operator S v (-) is given by 

sV(-) = sgn(-) Omax{| • | - v,0}. (2.6) 

Thus, the fixed point iterative algorithm is given by 

x k+1 =S Tfl (x k -Tg k ). (2.7) 

Hale et al. J24I1 proved global and finite convergence of this algorithm to the optimal solution of the l\ -regularized 
problem ( 11.61 ). 



Motivated by this work, we develop a fixed point iterative algorithm for (11.8b . Since the objective function in 
11.8b is convex, X* is the optimal solution to ( 11.81 ) if and only if 



(2.8) 
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where g(X*) = *f*(*/(X*) - b). Note that if the Singular Value Decomposition (SVD) of X is X = UEV J , where 
U G R mxr ,E = Diag(a) G W xr ,V G W xr , then (see e.g., 

d\\X\\, ={UV J +W : U J W =0,WV =0,\\W\\ < 1}. 

Hence, we get the following optimality conditions for d 1 .8b : 



Theorem 2 The matrix X G W x " with singular value decomposition X = UEV J , U G W xr ,E = Diag(a) G 
W xr ,V G K" xr , is optimal for the problem (11.8b if and only if there exists a matrix W G M. mxn such that 

li{UV J +W)+g(X)=0, (2.9a) 
C/ T W = 0,W = ) ||W , || 2 <1. (2.9b) 

Now based on the optimality conditions J2.8I ), we can develop a fixed point iterative scheme for solving ( 11.8b by 
adopting the operator splitting technique described at the beginning of this section. Note that J2.8I) is equivalent to 

G T^d\\X*\U+X* - {X* - Tg{X*)) (2.10) 



for any T > 0. If we let 



then ( 12.10b is reduced to 



Y* =X*-zg(X*), 



G TjU<9j|X*||* +X* -Y* , (2.11) 

i.e., X* is the optimal solution to 

min Tju||XL + i||X-y*||| (2.12) 

xeffi mx " 2 

In the following we will prove that the matrix shrinkage operator applied to Y* gives the optimal solution to 
( 12.12b . First, we need the following definitions. 



Definition 3 (Nonnegative Vector Shrinkage Operator) Assume x G E" . For any v > 0, the nonnegative vector 
shrinkage operator s v (-) is defined as 

, s _ ... _ \xi-v, ifx,-v>0 
Sylx) :=x, withx, = < 

I 0, o.w. 

Definition 4 (Matrix Shrinkage Operator) Assume X G W nxn and the SVD of X is given by X = [/Diag(a)V T , 
U G R'" x '', a G M. r + ,V G W xr . For any V > 0, the matrix shrinkage operator S v (-) is defined as 

S v {X):=UDiag{a)V J , with a = s v (ct). 

Theorem 3 Given a matrix Y G M. mxn with rank(Y) = t, let its Singular Value Decomposition (SVD) be Y = 
U Y Diag(y)Vj, where Uy G R mxt , /G M? + , Vy G R nxt , and a scalar V > 0. Then 



X:=Sy(Y)=U Y Diag(s v (r))V Y l 



(2.13) 
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is an optimal solution of the problem 



min f{X) :=v||X||* + -||X-F|||.. 

XeK"' x " 2 



(2.14) 



Proof Without loss of generality, we assume m < n. Suppose that the solution X G R mx " to problem ( 12.14b has the 
SVD X = [/Diag(CT)V T , where U G M mxr , a G W + ,V G M" x ''. Hence, X must satisfy the optimality conditions for 
( 12.141) which are 



i.e., there exists a matrix 

where U G M mx , V G W ,x {n ~ r ^> , a G 
ces, such that 



Diag(a) 



+ r , 1 1 a 1 1 oo < 1 and both U = [U , 17] and V = [V, V] are orthogonal matri- 
= v(UV J +W)+X-Y. (2.15) 



Hence, 



U 



V/ + Diag(cr) 
vDiag(a) 



V J - £/ y Diag(y)V F T = 0. 



(2.16) 



To verify that ( 12.131 ) satisfies ( 12.161 1, consider the following two cases: 

Case 1: 71 > 72 > • • • > Yt > V. In this case, choosing X as above, with r = t,U = Uy,V = Vy and a = s v (j) = 
7— Ve, where e is a vector of r ones, and choosing 6 = (i.e., W — 0) satisfies ( 12.16b . 

Case 2: 7! > 72 > . . . > 7; > v > y^+i > . . . > y t . In this case, by choosing r = fc, f/(l : t) = Uy,V(1 :t) = 
V Y) a = s v ((Yi,. ..,%)) and ffj = %+i/v, .. . ,Oi_jfc = 7,/v, a t - k+1 = ... = a m - r = 0, X and W satisfy d2J6>. 

Note that in both cases, X can be written as the form in ( 12.13b based on the way we construct X. □ 

Based on the above we obtain the fixed point iterative scheme ( 12.1b stated at the beginning of this section for 
solving problem (11.8b . 

Moreover, from the discussion following Theorem[2]we have 



Corollary 1 X* is an optimal solution to problem ( 11.8b if and only ifX* = S Tjl (h(X*)), where h(-) = /(•) — Tg(-). 



3 Convergence results 

In this section, we analyze the convergence properties of the fixed point iterative scheme J2.lt . Before we prove the 
main convergence result, we need some lemmas. 

Lemma 1 The shrinkage operator S v is non-expansive, i.e., for any Y\ and Y2 G M mx ", 

||5 v (Fi)-5 v (y 2 )|| F <||Fi-F 2 || F . (3.1) 

Moreover, 

II^-^IIf = \\S v (Y 1 )-Sy(Y 2 )\\ F <=>Yi -Y 2 =S v (Y l )-S v (Y 2 ). (3.2) 
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Proof Without loss of generality, we assume in < n. Assume SVDs of Y\ and Y 2 are Y\ = U\EV^ and Y2 = U 2 rV 2 , 
respectively, where 

= /Diag(a) <)\ Rmx = /Diag(r) \ 

i oy 'i oy 

ff= (o\,...,o s ),Oi > ... > G s > and 7 = (71, . .. ,J t ),Yi > ■■■ > Jt > 0. Note that here U\,V\,U 2 and V 2 are (full) 
orthogonal matrices; E,T G M mx ". Suppose that <7i > . . . > Cfy > V > Cfy+i > . . . > O s and 7i > . . . > 7/ > V > 
Yi+i > ■ ■ ■ > Jt, then 

?! := S V (Y X ) = UylV^h := S V (Y 2 ) = U 2 rV 2 T , 



where 



- = / Diag(a) 0\ Rmxn f = / Diag(7) 

loo 'loo, 



a = (ffi - v, . . . , a k - v) and 7= (71 - v, . . . ,7 - v). Thus, 

\\Yi -Y 2 \\l- \\Yi-Y 2 \\ 2 f = M(Yi-Y 2 ) J (Y l -Y 2 ))-7r((Y l -Y 2 ) J (Y l -Y 2 )) 

= Tr^yj _ f Tfj + yTy 2 _ p T? 2 ) _ 2Tt(Y^Y 2 - Y^Y 2 ) 



= £ of £ & v ) 2 + 1 - 1 ft - v ) 2 - 2Tr (^2 - ?i T ? 2 ) 

i=l i=l ;=1 1=1 



We note that 

Ti(Y?Y 2 - F 1 T ? 2 ) = Tr((yj - fO T (y 2 - Y 2 ) + (Fj - Fi ) T F 2 + Fi T (F 2 - F 2 )) 

= Tr(Vi (e - £) J u^u 2 (r - r )v 2 J +Vi(l - E) J uJu 2 rv 2 J + v x E T uJu 2 {r - r )V 2 J 
= Tr((z - E) T u(r - f )v T + (e- E) T urv T + E J u{r -f)v T ), 

where U = UjU 2 ,V = V^V 2 are clearly orthogonal matrices. Now let us derive an upper bound for Tr(Y^Y 2 — 
Fj T ? 2 ). It is known that an orthogonal matrix U is a maximizing matrix for the problem 

max{Tr(A[/) : U is orthogonal} 



if and only if AU is positive semidefinite matrix (see 7.4.9 in 12a]). It is also known that when AB is positive 
semidefinite, 

Tr(A5) = £a,(A5) < ^a,(A)a,(S). (3.3) 

i i 

Thus, Tr((Z - £) T £/(F -F)V T ), Tr((£ - E) J UrV J ) and Tr(£[/(F -F)V T ) achieve their maximum, if and only 
if (E - E) J U{r - F)V T , (E - E) J Ur V J and £[/(F - t)V J are all positive semidefinite. Applying £0]) to these 
three terms, we get Tr((Z -Z) T £/(F -f)V J ) < £ ; a,(£ -£)ct,-(F -f), Tr( (E -E) J UrV J ) < a,(Z -X)o;(F) 
and Tr(£C/ (F — F)V T ) < a,(£)a,(F — F). Thus, without loss of generality, assuming k<l<s<t,we have, 

||Fi-i2|||--||5v(ri)-Sv(F 2 )|| 2 ? 

s A: r / Z s /< Z 

> L cj / 2 -L( (7 '- v ) 2 +L^-L(^- v ) 2 - 2 (L <7 ' v + L Oi%+E(it-v)v+ £ oi(it-v)) 

i=l i'=l 1=1 i=l i=l i=Z+l Z=l i=k+l 

I s t s 

= £ (27v-v 2 + a, 2 -2c7,-7) + ( £ (7^+ £ yf - £ 2^7) • 

i=Jt+l i=/+l i'=Z+l i=Z+l 
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Now 

t ttf-t 2a i7i >0 

i=i+\ i=i+i i=i+\ 

since t > s and of + yf — 2(7,7; ^ 0- Also, since the function f(x) := 2yjx ~ x 2 is monotonely increasing in (— °°, yi\ 
and <7i < V < Yi, i = k+ 1, . . . , I, 

2jiV - V 2 + a 2 - 2(7,7;- > 0,i = 

Thus we get 

D(y,,y 2 ) := 11^ - Y 2 \\ 2 - ||S V (F,) -S V (Y 2 )\\ 2 F > 0; 

i.e., (|37TJ holds. 

Also, D(Y 1 ,Y 2 ) achieves its minimum value if and only if Tr((Z - E) J U(r - F)V T ), Tr((£ - E) J UrV J ) and 
Tr(£[/(r - f )V T ) achieve their maximum values simultaneously. 

Furthermore, if equality in (13.1b holds, i.e., D(5 / i , Y 2 ) achieves its minimum, and its minimum is zero, then k = I, 
s = t, and ffj = yi, i = k+1,.. .,s, which further implies E — E = F — F and Tr((X — E) J U (F — F)V T ) achieves its 
maximum. By applying the result 7.4.13 in 12611 . we get 

E-E = u(r-r)v J , 

which further implies that 

Y 1 -Y 2 = S V (Y 1 )-S V (Y 2 ). (3.4) 

To conclude, clearly ||5 v (Fi) - S V (Y 2 ) \\ F = \\Y { - Y 2 \\ F if (ES holds. □ 
The following two lemmas and theorem and their proofs are analogous to results and their proofs in Hale et al. 

A 

Lemma 2 Let stfX = Avec{X) and assume that t G (0,2/ 'X max (A~ t A)). Then the operator h(-) =/(•) — Tg(-) is non- 
expansive, i.e., \\h(X) - h(X')\\ F < \\X-X'\\ F . Moreover, h(X)-h(X') =X-X' ifandonlyif\\h{X)-h{X')\\ F = 
\\X-X'\\ F . 

Proof First, we note that since T G (0,2/A max (A T A)), -1 < A,(7- tA j A) < 1 , Vi, where A,-(J — tA j A) is the j-th 
eigenvalue of I — tA t A. Hence, 

\\h(X) - h(X')\\ F = ||(/-TA T A)(vec(X)-vec(X'))||2 < ||/ - tA t A|| 2 ||vec(X) - vec(X') || 2 
< ||vec(Z)-vec(Z')||2= \\X-X'\\ F . 

Moreover, — h(X')\\ F = \\X — X'\\ F if and only if the inequalities above are equalities, which happens if and 

only if 

(/ - TA T A)(vec(X) - vec(Z')) = vec(X) - vec(X'), 
i.e., if andonlyif/i(X)-/i(X') =X-X'. □ 

Lemma 3 Let X* be an optimal solution to problem ( 11.8b . T G (0,2/X max (A T A)) and V = TjU. Then X is also an 
optimal solution to problem ( 11.81 1 if and only if 



(3.5) 
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Proof The "only if" part is an immediate consequence of Corollary Q] For the "if" part, from Lemmas Q] and [2] 

\\X-X*\\ F = \\S v (h(X)) - S v (h(X*))\\ F < \\h(X)-h(X*)\\ P < \\X-X*\\ F . 
Hence, both inequalities hold with equality. Therefore, first using Lemma[T]and then Lemma[2]we obtain 

S v (h(X))-S v (h(X*))=h(X)-h(X*) =X-X\ 

which implies S v (h(X)) = X since S v (h(X*)) = X*. It then follows from Corollary Q] that X is an optimal solution 
to problem ( 11.8b . □ 
We now claim that the fixed-point iterations J2.lt converge to an optimal solution of problem ( 11.8b . 

Theorem 4 The sequence {X k } generated by the fixed point iterations with T 6 (0,2/X max (A T A)) converges to 
some X* £ X* , where S£* is the set of optimal solutions of problem (11.8b . 

Proof Since both S v (•) and h(-) are non-expansive, S v (h(-)) is also non-expansive. Therefore, {X k } lies in a compact 
set and must have a limit point, say X = limj^ cx ,X k j . Also, for any X* £ S£*, 

\\X k+1 -X*\\ F = \\S v (h(X k ))-S v (h(X*))\\ F < \\h(X k )-h(X*)\\ F < \\X k -X*\\ F , 

which means that the sequence {\\X k — X*\\ F } is monotonically non-increasing. Therefore, 

hm\\X k -X*\\ F = \\X-X*\\ F , (3.6) 

k— >=o 

where X can be any limit point of {X k }. By the continuity of S v (h(-)), the image of X, 

S v (h(X)) = UmS v (h(X k J)) = limZ^ +1 , 
/-»«■ ;'-»" 

is also a limit point of {X k }. Therefore, we have 

\\s v (h(X))-s v (h(x*))\\ F = \\s v (h(x))-x*\\ F = \\X-x*\\ F , 

which allows us to apply Lemma[3]to get that X is an optimal solution to problem 1 11. 81 . 
Finally, by setting X* = X £ S"* in (l3~6l l. we get that 

lim \\X k -X\\ F = lim \\X k i -X\\ F = 0, 
i.e., {X k } converges to its unique limit point X. □ 

4 Fixed point continuation 

In this section, we discuss a continuation technique (i.e., homotopy approach) for accelerating the convergence of 
the fixed point iterative algorithm d2.lt . 

4.1 Continuation 



Inspired by the work of Hale et al. 12411 . we first describe a continuation technique to accelerate the convergence 
of the fixed point iteration 02.1b . Our fixed point continuation (FPC) iterative scheme for solving dl .8b is outlined 
below. The parameter Tfy determines the rate of reduction of the consecutive jU&, i.e., 
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Fixed Point Continuation (FPC) 

- Initialize: Given X , p.>0. Select jii > /i 2 > • • • > P-l = A > 0. Set X = X Q . 

- for jU = jUi,jU2,...,ju L , do 

- while NOT converged, do 

• select t > 

• compute Y=X- {&/{X)-b), and SVD of Y, Y = [/Diag(cr) V 

• compute X = £/Diag(s TM (a))V T 

- end while 
end for 



/ijfc+i =max{ J u jfe Tfy,/l}, k=l,...,L-l 



4.2 Stopping criteria for inner iterations 

Note that in the fixed point continuation algorithm, in the k-th inner iteration we solve problem d 1 .8b for a fixed 
jj. = Hk. There are several ways to determine when to stop this inner iteration, decrease fx and go to the next inner 
iteration. The optimality conditions for ( 11.81) is given by J2.9ab and J2.9bb - Thus we can use the following condition 
as a stopping criterion: 

\\U k V?+gt/il\\ 2 -l<gtol, (4.1) 

where gtol is a small positive parameter. However, the expense of computing the largest singular value of a large 
matrix greatly decreases the speed of the algorithm. Hence, we do not use this criterion as a stopping rule for large 
matrices. Instead, we use the criterion 

W xk+l ~ xk h <xtol> (42) 



max{l,||X*|| F } 



where xtol is a small positive number, since when X k gets close to an optimal solution X*, the distance between X k 
and X k+l should become very small. 



4.3 Debiasing 

Debiasing is another technique that can improve the performance of FPC. Debiasing has been used in compressed 
sensing algorithms for solving jl.4\ and its variants, where debiasing is performed after a support set J? has been 
tentatively identified. Debiasing is the process of solving a least squares problem restricted to the support set , 
i.e., we solve 

min \\AyXy — b\\2, (4.3) 

where Aj is a submatrix of A whose columns correspond to the support index set J? , and x j; is a subvector of x 
corresponding to 

Our debiasing procedure for the matrix completion problem differs from the procedure used in compressed 
sensing since the concept of a support set is not applicable. When we do debiasing, we fix the matrices U k and V k 
in the singular value decomposition of X and then solve a least squares problem to determine the correct singular 
values a € W + ; i.e., we solve 

min \\£/{U k Diag(a)V kT )-b\\ 2 , (4.4) 
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where r is the rank of current matrix X . Because debiasing can be costly, we use a rule proposed in [43] to decide 
when to do it. In the continuation framework, we know that in each subproblem with a fixed ju, \\Xk + \ — X^Wf 
converges to zero, and ||g||2 converges to fi when X^ converges to the optimal solution of the subproblem. We 
therefore choose to do debiasing when jgl^/H^+i — X^Wf becomes large because this indicates that the change 
between two consecutive iterates is relatively small. Specifically, we call for debiasing in the solver FPC3 (see 
Section 7) when ||g||2/||X£ + i — X^f > 10. 



5 Bregman iterative algorithm 

Algorithm FPC is designed to solve fll.8l l, an optimal solution of which approaches an optimal solution of the 
nuclear norm minimization problem ( 11.7b as jj. goes to zero. However, by incorporating FPC into a Bregman iterative 
technique, we can solve ( 11.7b by solving a limited number of instances of ( 11.8b . each corresponding to a different b. 
Given a convex function /(•), the Bregman distance of the point u from the point v is defined as 

D p j(u,v) :=J(u)-J(v)- < p,u-v >, (5.1) 

where p 6 dJ(v) is some subgradient in the subdifferential of J at the point v. 

Bregman iterative regularization was introduced by Osher et al. in the context of image processing 13 ill . Specif- 
ically, in 13 ill , the Rudin-Osher-Fatemi 13411 model 

a = argmin i(J u j \Vu\ + -\\u — b\\% (5.2) 

was extended to an iterative regularization model by replacing the total variation functional 

J(u) = p.TV{u) = )i I |Vm|, 

by the Bregman distance with respect to J(u). This Bregman iterative regularization procedure recursively solves 

u k+l <-mmD'f {u,u k ) + l-\\u-b\\l (5.3) 
u 2 



for k = 0, 1, . . . starting with u = and p = 0. Since (15.31) is a convex programming problem, the optimality 
conditions are given by € dJ(u k+1 ) — p k + u k+x — b, from which we get the update formula for p k+x : 

p k+1 :=p k + b-u k+l . (5.4) 



Therefore, the Bregman iterative scheme is given by 



h. h 1 

2 lr " " ,u (5.5) 



u k+1 ^-min„D^ (u,u k ) + -\\u- b\\l 
p k+i = p k + b _ u k+i 



Interestingly, this turns out to be equivalent to the iterative process 

b k+x =b+(b k -u k ) 

^+ 1 ^min M /( M ) + I|| M -^+ 1 |||, (5 - 6) 
which can be easily implemented using existing algorithms for ( 15.2b with different inputs b. 
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Subsequently, Yin et al. 14411 proposed solving the basis pursuit problem ( 11.4b by applying the Bregman iterative 
regularization algorithm to 

nrin/(jc) + -||Ae-fe||2 (5.7) 
x 2 

for J(x) = , and obtained the following two equivalent iterative schemes analogous to ( I5.5b and d5.6l l. respec- 

tively: 

- Version 1: 

- for k = 0, 1, ... do 

- x k+l <— argmin ( D'f (x,x k ) + - \\Ax ~ b\\l 

- p k+l ^p k -A J (Ax k+l -b) 

- Version 2: 

- b° ^0,jc° <-0, 

- for k = 0, 1, ... do 

- b k+l t-b + itf-Ax*) 



- x k+1 <— argmin^/(jc) + - \\Ax-b k+1 \\\. 



One can also use the Bregman iterative regularization algorithm applied to the unconstrained problem ( 11.8b to 
solve the nuclear norm minimization problem J 1.71) . That is, one iteratively solves (11.8b by 

X k+1 ^xmnDf(X,X k ) + h^/(X)-b\\l, (5.8) 

and updates the subgradient p k+l by 

p k+l := p k — g/*(A(X k+1 ) — b), (5.9) 

where J(X) = ju||X||*. 

Equivalently, one can also use the following iterative scheme: 



b k+1 ^b + (b k -^(X k )) 



1„ ,*+l„2 ( 5 - 1Q ) 



X k+1 ^argmm x ii\\X\\, + -\\£/(X) 
Thus, our Bregman iterative algorithm for nuclear norm minimization (11.7b can be outlined as follows. The last 



Bregman Iterative Algorithm 

- b°^0,X°<-0, 

- for k = 0, 1, ... do 

- b k+l ^b + {b k -#f(X k )), 

- X k+X ^sxgmrxxii\\X\\ )f + -\\^{X)-b k 



step can be solved by Algorithm FPC. 
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6 An approximate SVD based FPC algorithm: FPCA 

Computing singular value decompositions is the main computational cost in Algorithm FPC. Consequently, instead 
of computing the full SVD of the matrix Y in each iteration, we implemented a variant of algorithm FPC in which 
we compute only a rank-r approximation to Y, where r is a predetermined parameter. We call this approximate SVD 
based FPC algorithm (FPCA). This approach greatly reduces the computational effort required by the algorithm. 
Specifically, we compute an approximate SVD by a fast Monte Carlo algorithm: the Linear Time SVD algorithm 
developed by Drineas et al. H 1 '711 - For a given matrix A G M mx ", and parameters c S7 k s G Z + with 1 < k s < c s < 
n and {pi}" = \, Pi > 0,T!! = \Pi = 1> this algorithm returns an approximation to the largest k s singular values and 
corresponding left singular vectors of the matrix A in linear 0(m + n) time. The Linear Time SVD Algorithm is 
outlined below. 

Linear Time Approximate SVD Algorithm[17] 

- Input: A G W" xn , c s ,k s G Z+ s.t.l < k, < c s < n, {pi}1 =1 s.t.pt > 0,£? =1 />; = 1- 

- Output: H k e W nxk * and a t (C),t = 1, .. .,k s . 

- For t = 1 to c,s, 

• Pick i t G 1 , . . . , n with Pr[i t = a] = p a ,Ct = 1 , . . . , n. 

• SetC« = A yc^-. 

- Compute C T C and its SVD; say C T C = YftU a K C W T ■ 

- Compute h' = Cy' /o t (C) for t = 1, . . . ,k s . 

- Return H ks , where H%' = h' , and O t (C) , t = 1 , . . . , k s . 



The outputs o t (C) , t = l,...,k s are approximations to the largest k s singular values and ,t = 1 , . . . , k are 
approximations to the corresponding left singular vectors of the matrix A. Thus, the SVD of A is approximated by 

A^A ks :=H k pmg(o(C))(A T H k pmg(l/e(C)) J . 



Drineas et al. 11711 prove that with high probability, the following estimate holds for both t, = 2 and t, = F : 



\\A-A ks \\\< min \\A- D\\j+ poly(k s ,l/c s )\\A\\ 2 F , (6.1) 
* D:rank(£>)<Ar. t * 



where poly(k s , 1 /c s ) is a polynomial in k s and 1 /c s . Thus, A ks is a approximation to the best rank-£ s approximation 
to A. (For any matrix M G R mx " with SVD M = Yli=\ OjUjvJ , where C7 X > . . . > 0> > 0, m, G R m ,V; G W, the best 
rank-k approximation to M is given by M = Y^=\ &i u i v J)- 

Note that in this algorithm, we compute an exact SVD of a smaller matrix C T C G M ClXe ' ! . Thus, c s determines the 
speed of this algorithm. If we choose a large c s , we need more time to compute the SVD of C T C. However, the larger 
c s is, the more likely are the a t (C) , t = 1 , . . . , k s to be close to the largest k s singular values of the matrix A since the 
second term in the right hand side of i6. II ) is smaller. In our numerical experiments, we found that we could choose 
a relatively small c s so that the computational time was reduced without significantly degrading the accuracy. In our 
tests, we obtained very good results by choosing c s = 2r m — 2, where r m = [(m + n— a/ (m + n) 2 —Ap) j1\ is, for a 
given number of entries sampled, the largest rank of m x n matrices for which the matrix completion problem has a 
unique solution. 

There are many ways to choose the probabilities In our numerical experiments in Section 7, we used the 



simplest one, i.e., we set all /?, equal to 1/n. For other choices of see 11711 and the references therein. 

In our numerical experiments, we set k s using the following procedure. In the &-th iteration, when computing 
the approximate SVD of Y k = X k — Tg k , we set k s equal to the number of components in sjt_i that are no less than 
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Sk s max{.S/t_ i } , where is a small positive number and max{s^ i } is the largest component in the vector [ used 
to form X k = C/^-'DiagfsVOV*- 1 . Note that k s is non-increasing in this procedure. However, if k s is too small at 
some iteration, the non-expansive property (13.1b of the shrinkage operator S v may be violated since the approximate 
SVD is not a valid approximation when k s is too small. Thus, in algorithm FPCA, if (13.1b is violated 10 times, we 
increase k s by 1 . Our numerical experience indicates that this technique makes our algorithm very robust. 

Our numerical results in Section 7 show that this approximate SVD based FPC algorithm: FPCA, is very fast, 
robust, and significantly outperforms other solvers (such as SDPT3) in recovering low-rank matrices. This result is 
not surprising. One reason for this is that in the approximate SVD algorithm, we compute a low-rank approximation 
to the original matrix. Hence, the iterative matrices produced by our algorithm are more likely to be of low-rank 
than an exact solution to the nuclear norm minimization problem (II. 10b , or equivalently, to the SDP ( 11.12b , which 
is exactly what we want. Some convergence/recoverability properties of a variant of FPCA, which uses a truncated 
SVD rather than a randomized SVD at each step, are discussed in 12311 . 



7 Numerical results 

In this section, we report on the application of our FPC, FPCA and Bregman iterative algorithms to a series of matrix 
completion problems of the form dl.2b to demonstrate the ability of these algorithms to efficiently recover low-rank 
matrices. 

To illustrate the performance of our algorithmic approach combined with exact and approximate SVD algo- 
rithms, different stopping rules, and with or without debiasing, we tested the following solvers. 

- FPC1. Exact SVD, no debiasing, stopping rule: (14.2b . 

- FPC2. Exact SVD, no debiasing, stopping rule: (l4~Tb and (l4~2l 

- FPC3. Exact SVD with debiasing, stopping rule: ( 14.2b . 

- FPCA. Approximate SVD, no debiasing, stopping rule: (14.2b . 

- Bregman. Bregman iterative method using FPC2 to solve the subproblems. 



7.1 FPC and Bregman iterative algorithms for random matrices 

In our first series of tests, we created random matrices M £ ffi mx " with rank r by the following procedure: we first 
generated random matrices Mi € W xr and Mr € W xr with i.i.d. Gaussian entries and then set M = MlMJ. We 
then sampled a subset £2 of p entries uniformly at random. For each problem with m x n matrix M, measurement 
number p and rank r, we solved 50 randomly created matrix completion problems. We use SR = p/(mn), i.e., the 
number of measurements divided by the number of entries of the matrix, to denote the sampling ratio. We also list 
FR = r(m + n — r)/p, i.e. the dimension of the set of rank r matrices divided by the number of measurements, in the 
tables. Note that if FR > 1, then there is always an infinite number of matrices with rank r with the given entries, so 
we cannot hope to recover the matrix in this situation. We use r m to denote the largest rank such that FR < 1, i.e., 
r m = [(m + n— *J (m + n) 2 — Ap) /2J . We use NS to denote the number of matrices that are recovered successfully. 
We use AT to denote the average time (seconds) for the examples that are successfully solved. 
We used the relative error 

_\\X opt -M\\ P 



\\M\\ F 

to estimate the closeness of X opt to M, where X opt is the "optimal" solution to ( 11. 101 ) produced by ou r alg orithms. 
We declared M to be recovered if the relative error was less than 10~ 3 , which is the criterion used in 1 32 1 and 
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We use RA,RU,RL to denote the average, largest and smallest relative error of the successfully recovered matrices, 
respectively. 

We summarize the parameter settings used by the algorithms in Table [T] We use /,„ to denote the maximum 
number of iterations allowed for solving each subproblem in FPC, i.e., if the stopping rules ( 14.2b (and (14.1b ) are not 
satisfied after /„, iterations, we terminate the subproblem and decrease fi to start the next subproblem. 



Table 1 Parameters in Algorithm FPC 



FPC 


p. = 10" 8 ,?7 M = 1/4, jii =T}f 1 \\s^*b\\ 2 ,T= \,Xtol = W~ w ,gtol = 10- 4 ,/ m =500,X = 


Approx SVD 


c s = 2r m - 2, e ks = \0~ 2 ,Pi = 1/«,V; 



All numerical experiments were run in MATLAB 7.3.0 on a Dell Precision 670 workstation with an Intel 
Xeon(TM) 3.4GHZ CPU and 6GB of RAM. 

The comparisons between FPC1, FPC2, FPC3 and SDPT3 for small matrix completion problems are presented 
in Table|2] From Table[2]we can see that FPC1 and FPC2 achieve almost the same recoverability and relative error, 
which means that as long as we set xtol to be very small (like 10~ 10 ), we only need to use ( 14.2b as the stopping 
rule for the inner iterations. That is, use of stopping rule ( 14.11 ) does not affect the performance of the algorithm. Of 
course FPC2 costs more time than FPC1 since more iterations are sometimes needed to satisfy the stopping rules in 
FPC2. While FPC3 can improve the recoverability, it costs more time for performing debiasing. SDPT3 seems to 
obtain more accurate solutions than FPC1, FPC2 or FPC3. 

Table 2 Comparisons of FPC 1, FPC2, FPC3 and SDPT3 for randomly created small matrix completion problems (m=n=40, p=800, 
SR=0.5) 



r 


FR 


Solver 


NS 


AT 


RA 


RU 


RL 


1 


0.0988 


FPC1 


50 


1.81 


1.67e-9 


1.22e-8 


6.06e-10 






FPC2 


50 


3.61 


1.32e-9 


1.20e-8 


2.55e-10 






FPC3 


50 


16.81 


1.06e-9 


2.22e-9 


5.68e-10 






SDPT3 


50 


1.81 


6.30e-10 


3.46e-9 


8.72e-ll 


2 


0.1950 


FPC1 


42 


3.05 


1.01e-6 


4.23e-5 


8.36e-10 






FPC2 


42 


17.97 


1.01e-6 


4.23e-5 


2.78e-10 






FPC3 


49 


16.86 


1.26e-5 


3.53e-4 


7.62e-10 






SDPT3 


44 


1.90 


1.50e-9 


7.18e-9 


1.82e-10 


3 


0.2888 


FPC1 


35 


5.50 


9.72e-9 


2.85e-8 


1.93e-9 






FPC2 


35 


20.33 


2.17e-9 


1.41e-8 


3.88e-10 






FPC3 


42 


16.87 


3.58e-5 


7.40e-4 


1.34e-9 






SDPT3 


37 


1.95 


2.66e-9 


1.58e-8 


3.08e-10 


4 


0.3800 


FPC1 


22 


9.08 


7.91e-5 


5.46e-4 


3.57e-9 






FPC2 


22 


18.43 


7.91e-5 


5.46e-4 


4.87e-10 






FPC3 


29 


16.95 


3.83e-5 


6.18e-4 


2.57e-9 






SDPT3 


29 


2.09 


1.18e-8 


7.03e-8 


7.97e-10 


5 


0.4688 


FPC1 


1 


10.41 


2.10e-8 


2.10e-8 


2.10e-8 






FPC2 


1 


17.88 


2.70e-9 


2.70e-9 


2.70e-9 






FPC3 


5 


16.70 


1.78e-4 


6.73e-4 


6.33e-9 






SDPT3 


8 


2.26 


1.83e-7 


8.12e-7 


2.56e-9 


6 


0.5550 


FPC1 

















FPC2 

















FPC3 

















SDPT3 


1 


2.87 


6.58e-7 


6.58e-7 


6.58e-7 
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To illustrate the performance of our Bregman iterative algorithm, we compare the results of using it versus using 
FPC2 in Table [3] From our numerical experience, for those problems for which the Bregman iterative algorithm 
greatly improves the recoverability, the Bregman iterative algorithm usually takes 2 to 3 iterations. Thus, in our 
numerical tests, we fixed the number of subproblems solved by our Bregman algorithm to 3. Since our Bregman 
algorithm achieves as good a relative error as the FPC algorithm, we only report how many of the examples that 
are successfully recovered by FPC, are improved greatly by using our Bregman iterative algorithm. In Table [3] 
NIM is the number of examples that the Bregman iterative algorithm outperformed FPC2 greatly (the relative errors 
obtained from FPC2 were at least 10 4 times larger than those obtained by the Bregman algorithm). From Table|3]we 
can see that for more than half of the examples successfully recovered by FPC2, the Bregman iterative algorithm 
improved the relative errors greatly (from [10" 10 , 1(T 9 ] to [10 -16 , 10" 15 ]). Of course the run times for the Bregman 
iterative algorithm were about three times that for algorithm FPC2, since the former calls the latter three times to 
solve the subproblems. 



Table 3 Numerical results for the Bregman iterative method for small matrix completion problems (m=n=40, p=800, SR=0.5) 



Problem 




FPC2 


Bregman 


r FR 


NIM (NS) 


RU RL 


RU RL 


1 0.0988 

2 0.1950 

3 0.2888 

4 0.3800 


32 (50) 
29 (42) 
24 (35) 
10 (22) 


2.22e-9 2.55e-10 
5.01e-9 2.80e-10 
2.77e-9 3.88e-10 
5.51e-9 4.87e-10 


1.87e-15 3.35e-16 
2.96e-15 6.83e-16 
2.93e-15 1.00e-15 
3.11e-15 1.30e-15 



In the following, we discuss the numerical results obtained by our approximate SVD based FPC algorithm 
(FPCA). We will see from these numerical results that FPCA achieves much better recoverability and is much faster 
than any of the solvers FPC1, FPC2, FPC3 or SDPT3. 

We present the numerical results of FPCA for small (m=n=40) and medium (m=n=100) problems in Tables |4] 
and Irrespectively. Since we found that xtol = 10~ 6 is small enough to guarantee very good recoverability, we set 
xtol — 10~ 6 in algorithm FPCA and used only ( 14.2b as stopping rule for the inner iterations. From these tables, 
we can see that our FPCA algorithm is much more powerful than SDPT3 for randomly created matrix completion 
problems. When m = n = 40 and p = 800, and the rank r was less than or equal to 8, FPCA recovered the matrices 
in all 50 examples. When rank r = 9, it failed on only one example. Even for rank r = 10, which is almost the 
largest rank that satisfies FR < 1, FPCA still recovered the solution in more than 60% of the examples. However, 
SDPT3 started to fail to recover the matrices when the rank r = 2. When r = 6, there was only one example out of 50 
where the correct solution matrix was recovered. When r > 7, none of the 50 examples could be recovered. For the 
medium sized matrices (m = n= 100) we used p = 2000, which is only a 20% measurement rate, FPCA recovered 
the matrices in all 50 examples when r < 6. For r = 7, FPCA recovered the matrices in most of the examples (49 
out of 50). When r = 8, more than 60% of the matrices were recovered successfully by FPCA. Even when r = 9, 
FPCA still recovered 1 matrices. However, SDPT3 could not recover all of the matrices even when the rank r = 1 
and none of the matrices were recovered when r > 4. When we increased the number of measurements to 3000, 
we recovered the matrices in all 50 examples up to rank r = 12. When r = 13, 14, we still recovered most of them. 
However, SDPT3 started to fail for some matrices when r = 3. When r > 8, SDPT3 failed to recover any of the 
matrices. We can also see that for the medium sized problems, FPCA was much faster than SDPT3. 
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Table 4 Numerical results for FPCA and SDPT3 for randomly created small matrix completion problems (m=n=40, p=800, SR=0.5) 



Problems 


FPCA 


SDPT3 


r 


FR 


NS 


AT 


RA 


RU 




RL 




NS 


AT 


RA 


RU 


RL 


1 


0.0988 


50 


3.49 


3.92e-7 


1.43e 


6 


2.72e 


7 


50 


1.84 


6.30e-10 


3.46e-9 


8.70e-ll 


2 


0.1950 


50 


3.60 


1.44e-6 


7.16e 


6 


4.4 le 


7 


44 


1.93 


1.50e-9 


7.18e-9 


1.82e-10 


3 


0.2888 


50 


3.97 


1.91e-6 


4.07e 


6 


9.28e 


7 


37 


1.99 


2.66e-9 


1.58e-8 


3.10e-10 


4 


0.3800 


50 


4.03 


2.64e-6 


8.14e 


6 


1.54e 


6 


29 


2.12 


1.18e-8 


7.03e-8 


8.00e-10 


5 


0.4688 


50 


4.16 


3.40e-6 


7.62e 


6 


1.52e 


6 


8 


2.30 


1.83e-7 


8.12e-7 


2.60e-9 


6 


0.5550 


50 


4.45 


4.08e-6 


7.62e 


6 


2.26e 


6 


1 


2.89 


6.58e-7 


6.58e-7 


6.58e-7 


7 


0.6388 


50 


4.78 


6.04e-6 


1.57e- 


5 


2.52e 


6 













8 


0.7200 


50 


4.99 


8.48e-6 


5.72e- 


5 


3.72e- 


6 













9 


0.7987 


49 


5.73 


2.58e-5 


5.94e 


4 


5.94e 


6 













10 


0.8750 


30 


7.20 


8.64e-5 


6.04e 


4 


8.48e 


6 













11 


0.9487 





























Table 5 Numerical results for FPCA and SDPT3 for randomly created medium matrix completion problems (m=n=100) 



Problems 


FPCA 


SDPT3 


n 

V 


r 


SR 


FR 


NS 


AT 


RA 


RU 


RL 


NS 


AT 


RA 


RU 


RL 


2000 


1 


0.2 


0.0995 


50 


4.93 


5.80e-6 


1.53e-5 


2.86e-6 


47 


15.10 


1.55e-9 


1.83e-8 


1.40e-10 


2000 


2 


0.2 


0.1980 


50 


5.26 


6.10e-6 


9.36e-6 


4.06e-6 


31 


16.02 


7.95e-9 


8.69e-8 


5.20e-10 


2000 


3 


0.2 


0.2955 


50 


5.80 


7.48e-6 


1.70e-5 


4.75e-6 


13 


19.23 


1.05e-4 


9.70e-4 


9.08e-10 


2000 


4 


0.2 


0.3920 


50 


9.33 


1.09e-5 


5.14e-5 


6.79e-6 













2000 


5 


0.2 


0.4875 


50 


5.42 


1.61e-5 


8.95e-5 


8.12e-6 













2000 


6 


0.2 


0.5820 


50 


7.02 


2.62e-5 


7.07e-5 


8.72e-6 













2000 


7 


0.2 


0.6755 


49 


8.69 


7.69e-5 


5.53e-4 


l.lle-5 













2000 


8 


0.2 


0.7680 


32 


10.94 


1.97e-4 


8.15e-4 


2.29e-5 













2000 


9 


0.2 


0.8595 


1 


11.75 


4.38e-4 


4.38e-4 


4.38e-4 













2000 


10 


0.2 


0.9500 
























3000 


1 


0.3 


0.0663 


50 


7.73 


1.97e-6 


3.15e-6 


1.22e-6 


50 


36.68 


2.01e-10 


9.64e-10 


7.52e-ll 


3000 


2 


0.3 


0.1320 


50 


7.85 


2.68e-6 


8.41e-6 


1.44e-6 


50 


36.50 


1.13e-9 


2.97e-9 


1.77e-10 


3000 


3 


0.3 


0.1970 


50 


8.10 


2.82e-6 


4.38e-6 


1.83e-6 


46 


38.50 


1.28e-5 


5.89e-4 


2.10e-10 


3000 


4 


0.3 


0.2613 


50 


8.94 


3.57e-6 


5.62e-6 


2.64e-6 


42 


41.28 


4.60e-6 


1.21e-4 


4.53e-10 


3000 


5 


0.3 


0.3250 


50 


9.12 


4.06e-6 


8.41e-6 


2.78e-6 


32 


43.92 


7.82e-8 


1.50e-6 


1.23e-9 


3000 


6 


0.3 


0.3880 


50 


9.24 


4.84e-6 


9.14e-6 


3.71e-6 


17 


49.60 


3.44e-7 


4.29e-6 


3.68e-9 


3000 


7 


0.3 


0.4503 


50 


9.41 


5.72e-6 


1.09e-5 


3.96e-6 


3 


59.18 


1.43e-4 


4.28e-4 


1.57e-7 


3000 


8 


0.3 


0.5120 


50 


9.62 


6.37e-6 


1.90e-5 


4.43e-6 













3000 


9 


0.3 


0.5730 


50 


10.35 


6.32e-6 


1.60e-5 


4.56e-6 













3000 


10 


0.3 


0.6333 


50 


10.93 


8.45e-6 


3.79e-5 


5.59e-6 













3000 


11 


0.3 


0.6930 


50 


11.58 


1.41e-5 


6.84e-5 


6.99e-6 













3000 


12 


0.3 


0.7520 


50 


12.17 


1.84e-5 


1.46e-4 


8.84e-6 













3000 


13 


0.3 


0.8103 


48 


15.24 


5.12e-5 


6.91e-4 


1.25e-5 













3000 


14 


0.3 


0.8680 


39 


18.85 


2.35e-4 


9.92e-4 


2.05e-5 













3000 


15 


0.3 


0.9250 
























3000 


16 


0.3 


0.9813 

























7.2 Comparison of FPCA and SVT 

In this subsection we compare our FPCA algorithm against the SVT algorithm proposed in J8|]. The SVT code 
is downloaded from \http://svt. cal tech. edu We constructed two sets of test problems. One set contained "easy" 
problems. These problems are "easy" because the matrices are of very low-rank compared to the matrix size and the 
number of samples, and hence they are easy to recover. For all problems in this set, FR was less than 0.34. The other 
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set contained "hard" problems, i.e., problems that are very challenging. These problems involved matrices that are 
not of very low rank and for which sampled a very limited number of entries. For this set of problems, FR ranged 
from 0.40 to 0.87. The maximum iteration number in SVT was set to be 1000. All other parameters were set to 
their default values in SVT. The parameters of FPCA were set somewhat loosely for easy problems. Specifically, we 
set jU = 10~ 4 , xtol = 10~ 4 , T = 2,I m = 10 and all other parameters were set to the values given in Table [TJ Relative 
errors and times were averaged over 5 runs. In this subsection, all test matrices were square, i.e., m = n. 

Table 6 Comparison of FPCA and SVT on easy problems 



Problems 


FPCA 


SVT 


n 


r 


P 


SR 


FR 


rel.err. 


time 


rel.err. 


time 


100 


10 


5666 


0.57 


0.34 


4.27e-5 


0.39 


1.64e-3 


30.40 


200 


10 


15665 


0.39 


0.25 


6.40e-5 


1.38 


1.90e-4 


9.33 


500 


10 


49471 


0.20 


0.20 


2.48e-4 


8.01 


1.88e-4 


23.77 


1000 


10 


119406 


0.12 


0.17 


5.04e-4 


18.49 


1.68e-4 


41.81 


1000 


50 


389852 


0.39 


0.25 


3.13e-5 


120.64 


1.63e-4 


228.79 


1000 


100 


569900 


0.57 


0.33 


2.26e-5 


177.17 


1.71e-4 


635.15 


5000 


10 


597973 


0.02 


0.17 


1.58e-3 


1037.12 


1.73e-4 


121.39 


5000 


50 


2486747 


0.10 


0.20 


5.39e-4 


1252.70 


1.59e-4 


1375.33 


5000 


100 


3957533 


0.16 


0.25 


2.90e-4 


2347.41 


1.74e-4 


5369.76 



From Table [6] we can see that for the easy problems except for one problem which is exceptionally sparse as 
well as having low rank, FPCA was much faster and usually provided more accurate solutions than SVT. 

For hard problems, all parameters of FPCA were set to the values given in TableQ] except that we set xtol — 10 
since this value is small enough to guarantee very good recoverability. Also, for small problems ( i.e., max{m,n} < 
1000 ), we set I m = 500; and for large problems ( i.e., max{m,«} > 1000 ), we set /„, = 20. We use " — " to indicate 
that the algorithm either diverges or does not terminate in one hour. Relative errors and times were averaged over 5 
runs. 

Table 7 Comparison of FPCA and SVT on hard problems 



Problems 


FPCA 


SVT 


n r SR FR 


rel.err. time 


rel.err. time 


40 9 0.5 0.80 


1.21e-5 5.72 


5.01e-l 3.05 


100 14 0.3 0.87 


1.32e-4 19.00 


8.31e-l 316.90 


1000 20 0.1 0.40 


2.46e-5 116.15 




1000 30 0.1 0.59 


2.00e-3 128.30 




1000 50 0.2 0.49 


1.04e-5 183.67 





From Table 13 we can see that for the hard problems, SVT either diverged or did not solve the problems in less 
than one hour, or it yielded a very inaccurate solution. In contrast, FPCA always provided a very good solution 
efficiently. 

We can also see that FPCA was able to efficiently solve large problems (m =n= 1000) that could not be solved 
by SDPT3 due to the large size of the matrices and the large number of constraints. 
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7.3 Results for real data matrices 

In this section, we consider matrix completion problems based on two real data sets: the Jester joke data set ||22 
and the DNA data set $3^\ . The Jester joke data set contains 4.1 million ratings for 100 jokes from 73,421 users 



and is available on the website http://www.ieor.berkeley.edu/~Egoldberg/jester-data/ Since the number of jokes is 



only 100, but the number of users is quite large, we randomly selected n u users to get a modestly sized matrix for 



testing purpose. As in 13711 . we randomly held out two ratings for each user. Since some entries in the matrix are 



missing, we cannot compute the relative error as we did for the randomly created matrices. Instead, we computed 



the Normalized Mean Absolute Error (NMAE) as in [22] and [37]. The Mean Absolute Error (MAE) is defined as 



where r l - and f*. are the withheld and predicted ratings of movie j by user 2, respectively, for j = i\,i2- NMAE is 
defined as 

MAF 

NMAE = , (7.2) 

'"max — r min 

where r m ; n and r max are lower and upper bounds for the ratings. Since all ratings are scaled to the range [— 10, +10], 
we have r m ; n = — 10 and r max = 10. 

The numerical results for the Jester data set using FPC1 and FPCA are presented in Tables [8] and [9] respectively. 
In these two tables, (7 max and <7 m j n are the largest and smallest positive singular values of the recovered matrices, 
and rank is the rank of the recovered matrices. The distributions of the singular values of the recovered matrices are 
shown in FiguresQ]and[2] From Tables[8]and|9]we can see that by using FPC1 and FPCA to recover these matrices, 



we can get relatively low NMAEs, which are comparable to the results shown in 13711 and 12211 



Table 8 Numerical results for FPC1 for the Jester joke data set 



num.user 


num.samp 


samp.ratio 


rank 


Cmax 




NMAE 


Time 


100 


7172 


0.7172 


79 


285.65 


3.49e-4 


0.1727 


34.30 


1000 


71152 


0.7115 


100 


786.37 


38.43 


0.1667 


304.81 


2000 


140691 


0.7035 


100 


1.1242e+3 


65.06 


0.1582 


661.66 



Table 9 Numerical results for FPCA for the Jester joke data set (c s is the number of rows we picked for the approximate SVD) 

num.user num.samp samp.ratio e^ s c s rank <7 max cr m j n NMAE Time 

100 7172 07172 KT 2 " 25 20 295T4 32l>8 01627 26.73 

1000 71152 0.7115 10" 2 100 85 859.27 48.04 0.2008 808.52 

1000 71152 0.7115 10" 4 100 90 859.46 44.62 0.2101 778.56 

2000 140691 0.7035 10" 4 200 100 1.1518e+3 63.52 0.1564 1.1345e+3 



We also used two data sets of DNA microarrays from 13511 . These data sets are available on the website http://cellcycle-www.stanfor 
The first microarray data set is a matrix that represents the expression of 6178 genes in 14 experiments based on the 
Elutriation data set in I35I1 . The second microarray data set is based on the Cdcl5 data set in I35I1 . and represents 
the expression of 6178 genes in 24 experiments. However, some entries in these two matrices are missing. For eval- 
uating our algorithms, we created complete matrices by deleted all rows containing missing values. This is similar 
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20 40 60 



Fig. 1 Distribution of the singular values of the recovered matrices for the Jester data set using FPC1. Left:100 users, Middle: 1000 
users, Right: 2000 users 





Fig. 2 Distribution of the singular values of the recovered matrices for the Jester data set using FPCA. Upper Left: 100 users, Ek, = 



10- 



= 25; Upper Right: 1000 users, e ks = 10~ 2 ,c s = 100; Bottom Left: 1000 users, e ks = 10" 



e ks = 10-V., = 200 



100; Bottom Right: 2000 users, 



to how the DNA microarray data set was preprocessed in 14 lh . The resulting complete matrix for the Elutriation 
data set was 5766 x 14. The complete matrix for the Cdcl5 data set was 4381 x 24. We must point out that these 
DNA microarray matrices are neither low-rank nor even approximately low-rank although such claims have been 
made in some papers. The distributions of the singular values of these two matrices are shown in Figure [3] From 
this figure we can see that in each microarray matrix, only one singular value is close to zero, while the others are 
far away from zero. Thus there is no way to claim that the rank of the Elutriation matrix is less than 13, or the 
rank of the Cdcl5 matrix is less than 23. Since these matrices are not low -rank, we cannot expect our algorithms to 
recover these matrices by sampling only a small portion of their entries. Thus we needed to further modify the data 
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Fig. 3 Distribution of the singular values of the matrices in the original DNA microarray data sets. Left: Elutriation matrix; Right: 
Cdcl5 matrix. 



sets to yield low-rank matrices. Specifically, we used the best rank-2 approximation to the Elutriation matrix as the 
new complete Elutriation matrix and the best rank-5 approximation to the Cdcl5 matrix as the new complete Cdcl5 
matrix. The numerical results for FPCA for recovering these two matrices are presented in Table [lOl In the FPCA 
algorithm, we set = 10~ 2 and xtol — 10 -6 . For the Elutriation matrix, we set c s = 1 15 and for the Cdcl5 matrix, 
we set c s = 88. The observed entries were randomly sampled. From Table [lOl we can see that by taking 60% of the 
entries of the matrices, our FPCA algorithm can recover these matrices very well, yielding relative errors as low as 
10~ 5 and 10~ 6 , which is promising for practical use. 



Table 10 Numerical results of FPCA for DNA microarray data sets 



Matrix 


m 


n 


P 


rank 


SR 


FR 


rel.err 


Time 


Elutriation 


5766 


14 


48434 


2 


0.6 


0.2386 


1.83e-5 


218.01 


Cdcl5 


4381 


24 


63086 


5 


0.6 


0.3487 


7.95e-6 


189.32 



To graphically illustrate the effectiveness of FPCA, we applied it to image inpainting |3J]. Grayscale images and 
color images can be expressed as matrices and tensors, respectively. In grayscale image inpainting, the grayscale 
value of some of the pixels of the image are missing, and we want to fill in these missing values. If the image is 
of low-rank, or of numerical low-rank, we can solve the image inpainting problem as a matrix completion problem 
( 11.2b . In our test we applied SVD to the 512 x 512 image in Figure @la), and truncated this decomposition to get 
the rank-40 image which is shown in Figure @|b). Figure |4fc) is a masked version of the image in Figure Ufa), 
where one half of the pixels in Figure |4ja) were masked uniformly at random. Figure |4jd) is the image obtained 
from Figure|4jc) by applying FPCA. Figure|4td) is a low-rank approximation to Figure|4ta) with a relative error of 
8.41e — 2. Figure |4]e) is a masked version of the image in Figure Ub), where one half of the pixels in Figure |4jb) 
were masked uniformly at random. Figure |4]T) is the image obtained from Figure |4]e) by applying FPCA. Figure 
|4ff) is an approximation to Figure |4tb) with a relative error of 3.61e — 2. Figure |4£g) is another masked image 
obtained from Figure |4jb), where 4 percent of the pixels were masked in a non-random fashion. Figure |4fh) is the 
image obtained from Figure |4jg) by applying FPCA. Figure |4jg) is an approximation to Figure |4jb) with a relative 
error of 1.70e-2. 
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50 100 150 200 250 300 350 400 450 500 50 100 150 200 250 300 350 400 450 500 



(g) (h) 

Fig. 4 (a): Original 512x512 image with full rank; (b): Original image truncated to be of rank 40; (c): 50% randomly masked original 
image; (d): Recovered image from 50% randomly masked original image (rel.err = 8.41e — 2); (e): 50% randomly masked rank 40 
image; (f): Recovered image from 50% randomly masked rank 40 image (rel.err = 3.61e — 2); (g): Deterministically masked rank 40 
image (SR = 0.96); (h): Recovered image from deterministically masked rank 40 image {rel.err = l.lQe — 2). 
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8 Conclusions and discussions 

In this paper, we derived a fixed point continuation algorithm and a Bregman iterative algorithm for solving the 
linearly constrained nuclear norm minimization problem, which is a convex relaxation of the NP-hard linearly 
constrained matrix rank minimization problem. The convergence of the fixed point iterative scheme was established. 
By adopting an approximate SVD technique, we obtained a very powerful algorithm (FPCA) for the matrix rank 
minimization problem. On matrix completion problems, FPCA greatly outperforms SDP solvers such as SDPT3 in 
both speed and recoverability of low-rank matrices. Further study is needed to prove the convergence of algorithm 
FPCA. 

Acknowledgements We would like to thank two anonymous referees for their helpful comments. The first author thanks Junzhou 
Huang from Rutgers University for fruitful discussions on image inpainting. 
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